{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Natural Language Inference on XNLI Dataset using BERT with Azure Machine Learning\n",
    "\n",
    "## 1. Summary\n",
    "In this notebook, we demonstrate how to fine-tune BERT using distributed training (Horovod) on Azure Machine Learning service to do language inference in English. We use the [XNLI](https://github.com/facebookresearch/XNLI) dataset and to classify sentence pairs into three classes: contradiction, entailment, and neutral.   \n",
    "\n",
    "The figure below shows how [BERT](https://arxiv.org/abs/1810.04805) classifies sentence pairs. It concatenates the tokens in each sentence pairs and separates the sentences by the [SEP] token. A [CLS] token is prepended to the token list and used as the aggregate sequence representation for the classification task.\n",
    "<img src=\"https://nlpbp.blob.core.windows.net/images/bert_two_sentence.PNG\">\n",
    "\n",
    "**Note: To learn how to do pre-training on your own, please reference the [AzureML-BERT repo](https://github.com/microsoft/AzureML-BERT) created by Microsoft.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/entailment/entailment_xnli_bert_azureml.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Imports\n",
    "\n",
    "import sys\n",
    "\n",
    "sys.path.append(\"../..\")\n",
    "\n",
    "import os\n",
    "import shutil\n",
    "import torch\n",
    "import json\n",
    "import pandas as pd\n",
    "\n",
    "import azureml.core\n",
    "from azureml.train.dnn import PyTorch\n",
    "from azureml.core.runconfig import MpiConfiguration\n",
    "from azureml.core import Experiment\n",
    "from azureml.widgets import RunDetails\n",
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.exceptions import ComputeTargetException\n",
    "from utils_nlp.azureml.azureml_utils import get_or_create_workspace, get_output_files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# Parameters\n",
    "\n",
    "DEBUG = True\n",
    "NODE_COUNT = 4\n",
    "NUM_PROCESS = 1\n",
    "DATA_PERCENT_USED = 1.0\n",
    "\n",
    "config_path = (\n",
    "    \"./.azureml\"\n",
    ")  # Path to the directory containing config.json with azureml credentials\n",
    "\n",
    "# Azure resources\n",
    "subscription_id = \"YOUR_SUBSCRIPTION_ID\"\n",
    "resource_group = \"YOUR_RESOURCE_GROUP_NAME\"  \n",
    "workspace_name = \"YOUR_WORKSPACE_NAME\"  \n",
    "workspace_region = \"YOUR_WORKSPACE_REGION\"  # eg: eastus, eastus2.\n",
    "cluster_name = \"gpu-entail\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. AzureML Setup\n",
    "\n",
    "### 2.1 Initialize a Workspace\n",
    "\n",
    "The following cell looks to set up the connection to your [Azure Machine Learning service Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace). You can choose to connect to an existing workspace or create a new one. \n",
    "\n",
    "**To access an existing workspace:**\n",
    "1. If you have a `config.json` file, you do not need to provide the workspace information; you will only need to update the `config_path` variable that is defined above which contains the file.\n",
    "2. Otherwise, you will need to supply the following:\n",
    "    * The name of your workspace\n",
    "    * Your subscription id\n",
    "    * The resource group name\n",
    "\n",
    "**To create a new workspace:**\n",
    "\n",
    "Set the following information:\n",
    "* A name for your workspace\n",
    "* Your subscription id\n",
    "* The resource group name\n",
    "* [Azure region](https://azure.microsoft.com/en-us/global-infrastructure/regions/) to create the workspace in, such as `eastus2`. \n",
    "\n",
    "This will automatically create a new resource group for you in the region provided if a resource group with the name given does not already exist. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "ws = get_or_create_workspace(\n",
    "    config_path=config_path,\n",
    "    subscription_id=subscription_id,\n",
    "    resource_group=resource_group,\n",
    "    workspace_name=workspace_name,\n",
    "    workspace_region=workspace_region,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\n",
    "    \"Workspace name: \" + ws.name,\n",
    "    \"Azure region: \" + ws.location,\n",
    "    \"Subscription id: \" + ws.subscription_id,\n",
    "    \"Resource group: \" + ws.resource_group,\n",
    "    sep=\"\\n\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 Link AmlCompute Compute Target\n",
    "\n",
    "We need to link a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training our model (see [compute options](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#supported-compute-targets) for explanation of the different options). We will use an [AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) target and link to an existing target (if the cluster_name exists) or create a STANDARD_NC6 GPU cluster (autoscales from 0 to 4 nodes) in this example. Creating a new AmlComputes takes approximately 5 minutes. \n",
    "\n",
    "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found compute target: gpu-entail\n",
      "{'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-08-03T13:43:20.068000+00:00', 'errors': None, 'creationTime': '2019-07-27T02:14:46.127092+00:00', 'modifiedTime': '2019-07-27T02:15:07.181277+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6S_V2'}\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
    "    print(\"Found compute target: {}\".format(cluster_name))\n",
    "except ComputeTargetException:\n",
    "    print(\"Creating new compute target: {}\".format(cluster_name))\n",
    "    compute_config = AmlCompute.provisioning_configuration(\n",
    "        vm_size=\"STANDARD_NC6\", max_nodes=NODE_COUNT\n",
    "    )\n",
    "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
    "    compute_target.wait_for_completion(show_output=True)\n",
    "\n",
    "\n",
    "print(compute_target.get_status().serialize())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'./entail_utils\\\\utils_nlp'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "project_dir = \"./entail_utils\"\n",
    "if DEBUG and os.path.exists(project_dir):\n",
    "    shutil.rmtree(project_dir)\n",
    "shutil.copytree(\"../../utils_nlp\", os.path.join(project_dir, \"utils_nlp\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Prepare Training Script"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing ./entail_utils/train.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile $project_dir/train.py\n",
    "import horovod.torch as hvd\n",
    "import torch\n",
    "import numpy as np\n",
    "import time\n",
    "import argparse\n",
    "from utils_nlp.common.timer import Timer\n",
    "from utils_nlp.dataset.xnli_torch_dataset import XnliDataset\n",
    "from utils_nlp.models.bert.common import Language\n",
    "from utils_nlp.models.bert.sequence_classification_distributed import (\n",
    "    BERTSequenceClassifier,\n",
    ")\n",
    "from sklearn.metrics import classification_report\n",
    "\n",
    "print(\"Torch version:\", torch.__version__)\n",
    "\n",
    "hvd.init()\n",
    "\n",
    "LANGUAGE_ENGLISH = \"en\"\n",
    "TRAIN_FILE_SPLIT = \"train\"\n",
    "TEST_FILE_SPLIT = \"test\"\n",
    "TO_LOWERCASE = True\n",
    "PRETRAINED_BERT_LNG = Language.ENGLISH\n",
    "LEARNING_RATE = 5e-5\n",
    "WARMUP_PROPORTION = 0.1\n",
    "BATCH_SIZE = 32\n",
    "NUM_GPUS = 1\n",
    "OUTPUT_DIR = \"./outputs/\"\n",
    "LABELS = [\"contradiction\", \"entailment\", \"neutral\"]\n",
    "\n",
    "## each machine gets it's own copy of data\n",
    "CACHE_DIR = \"./xnli-%d\" % hvd.rank()\n",
    "\n",
    "parser = argparse.ArgumentParser()\n",
    "# Training settings\n",
    "parser.add_argument(\n",
    "    \"--seed\", type=int, default=42, metavar=\"S\", help=\"random seed (default: 42)\"\n",
    ")\n",
    "parser.add_argument(\n",
    "    \"--epochs\", type=int, default=2, metavar=\"S\", help=\"random seed (default: 2)\"\n",
    ")\n",
    "parser.add_argument(\n",
    "    \"--no-cuda\", action=\"store_true\", default=False, help=\"disables CUDA training\"\n",
    ")\n",
    "parser.add_argument(\n",
    "    \"--data_percent_used\",\n",
    "    type=float,\n",
    "    default=1.0,\n",
    "    metavar=\"S\",\n",
    "    help=\"data percent used (default: 1.0)\",\n",
    ")\n",
    "\n",
    "args = parser.parse_args()\n",
    "args.cuda = not args.no_cuda and torch.cuda.is_available()\n",
    "\n",
    "\"\"\"\n",
    "Note: For example, you have 4 nodes and 4 GPUs each node, so you spawn 16 workers. \n",
    "Every worker will have a rank [0, 15], and every worker will have a local_rank [0, 3]\n",
    "\"\"\"\n",
    "if args.cuda:\n",
    "    torch.cuda.set_device(hvd.local_rank())\n",
    "    torch.cuda.manual_seed(args.seed)\n",
    "\n",
    "# num_workers - this is equal to number of gpus per machine\n",
    "kwargs = {\"num_workers\": NUM_GPUS, \"pin_memory\": True} if args.cuda else {}\n",
    "\n",
    "train_dataset = XnliDataset(\n",
    "    file_split=TRAIN_FILE_SPLIT,\n",
    "    cache_dir=CACHE_DIR,\n",
    "    language=LANGUAGE_ENGLISH,\n",
    "    to_lowercase=TO_LOWERCASE,\n",
    "    tok_language=PRETRAINED_BERT_LNG,\n",
    "    data_percent_used=args.data_percent_used,\n",
    ")\n",
    "\n",
    "\n",
    "# set the label_encoder for evaluation\n",
    "label_encoder = train_dataset.label_encoder\n",
    "num_labels = len(np.unique(train_dataset.labels))\n",
    "\n",
    "# Train\n",
    "classifier = BERTSequenceClassifier(\n",
    "    language=Language.ENGLISH,\n",
    "    num_labels=num_labels,\n",
    "    cache_dir=CACHE_DIR,\n",
    "    use_distributed=True,\n",
    ")\n",
    "\n",
    "\n",
    "train_loader = classifier.create_data_loader(\n",
    "    train_dataset, BATCH_SIZE, mode=\"train\", **kwargs\n",
    ")\n",
    "\n",
    "\n",
    "num_samples = len(train_loader.dataset)\n",
    "num_batches = int(num_samples / BATCH_SIZE)\n",
    "num_train_optimization_steps = num_batches * args.epochs\n",
    "optimizer = classifier.create_optimizer(\n",
    "    num_train_optimization_steps, lr=LEARNING_RATE, warmup_proportion=WARMUP_PROPORTION\n",
    ")\n",
    "\n",
    "with Timer() as t:\n",
    "    for epoch in range(1, args.epochs + 1):\n",
    "\n",
    "        # to allow data shuffling for DistributedSampler\n",
    "        train_loader.sampler.set_epoch(epoch)\n",
    "\n",
    "        # epoch and num_epochs is passed in the fit function to print loss at regular batch intervals\n",
    "        classifier.fit(\n",
    "            train_loader,\n",
    "            epoch=epoch,\n",
    "            num_epochs=args.epochs,\n",
    "            bert_optimizer=optimizer,\n",
    "            num_gpus=NUM_GPUS,\n",
    "        )\n",
    "\n",
    "#if machine has multiple gpus then run predictions on only on 1 gpu since test_dataset is small.\n",
    "if hvd.rank() == 0:\n",
    "    NUM_GPUS = 1\n",
    "    \n",
    "    test_dataset = XnliDataset(\n",
    "        file_split=TEST_FILE_SPLIT,\n",
    "        cache_dir=CACHE_DIR,\n",
    "        language=LANGUAGE_ENGLISH,\n",
    "        to_lowercase=TO_LOWERCASE,\n",
    "        tok_language=PRETRAINED_BERT_LNG,\n",
    "    )\n",
    "\n",
    "    test_loader = classifier.create_data_loader(test_dataset, mode=\"test\")\n",
    "\n",
    "    # predict\n",
    "    predictions, pred_labels = classifier.predict(test_loader, NUM_GPUS)\n",
    "\n",
    "    predictions = label_encoder.inverse_transform(predictions)\n",
    "\n",
    "    # Evaluate\n",
    "    results = classification_report(\n",
    "        pred_labels, predictions, target_names=LABELS, output_dict=True\n",
    "    )\n",
    "\n",
    "    result_file = os.path.join(OUTPUT_DIR, \"results.json\")\n",
    "    with open(result_file, \"w+\") as fp:\n",
    "        json.dump(results, fp)\n",
    "\n",
    "    # save model\n",
    "    classifier.save_model()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Create a PyTorch Estimator\n",
    "\n",
    "BERT is built on PyTorch, so we will use the AzureML SDK's PyTorch estimator to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, see [How to Train Pytorch Models on AzureML](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "mpiConfig = MpiConfiguration()\n",
    "mpiConfig.process_count_per_node = NUM_PROCESS\n",
    "\n",
    "script_params = {\n",
    "    '--data_percent_used': DATA_PERCENT_USED\n",
    "}\n",
    "\n",
    "est = PyTorch(\n",
    "    source_directory=project_dir,\n",
    "    compute_target=compute_target,\n",
    "    entry_script=\"train.py\",\n",
    "    script_params = script_params,\n",
    "    node_count=NODE_COUNT,\n",
    "    distributed_training=mpiConfig,\n",
    "    use_gpu=True,\n",
    "    framework_version=\"1.0\",\n",
    "    conda_packages=[\"scikit-learn=0.20.3\", \"numpy\", \"spacy\", \"nltk\"],\n",
    "    pip_packages=[\"pandas\", \"seqeval[gpu]\", \"pytorch-pretrained-bert\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Create Experiment and Submit a Job\n",
    "Submit the estimator object to run your experiment. Results can be monitored using a Jupyter widget. The widget and run are asynchronous and update every 10-15 seconds until job completion.\n",
    "\n",
    "**Note**: The experiment takes ~4 hours with 2 NC24 nodes and ~7hours with 4 NC6 nodes. The overhead is due to the communication time between nodes.    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = Experiment(ws, name=\"NLP-Entailment-BERT\")\n",
    "run = experiment.submit(est)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "c8e7a44fa8804e95b21eea74d7694b1e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "RunDetails(run).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the above cell is an async call, the below cell is a blocking call to stop the cells below it to execute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run.wait_for_completion()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Analyze Results\n",
    "\n",
    "Download result.json from portal and open to view results. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading file outputs/results.json to ./outputs\\results.json...\n"
     ]
    }
   ],
   "source": [
    "file_names = [\"outputs/results.json\"]\n",
    "get_output_files(run, \"./outputs\", file_names=file_names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "               f1-score  precision    recall  support\n",
      "contradiction  0.838749   0.859296  0.819162   1670.0\n",
      "entailment     0.817280   0.877663  0.764671   1670.0\n",
      "neutral        0.777870   0.719817  0.846108   1670.0\n",
      "micro avg      0.809980   0.809980  0.809980   5010.0\n",
      "macro avg      0.811300   0.818925  0.809980   5010.0\n",
      "weighted avg   0.811300   0.818925  0.809980   5010.0\n"
     ]
    }
   ],
   "source": [
    "with open(\"outputs/results.json\", \"r\") as handle:\n",
    "    parsed = json.load(handle)\n",
    "    print(pd.DataFrame.from_dict(parsed).transpose())"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python (nlp_gpu_transformer_bug_bash)",
   "language": "python",
   "name": "nlp_gpu_transformer_bug_bash"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
